Single haplotype assembly of the human genome from a hydatidiform mole.

Authors

  • Karyn Meltz Steinberg
  • Valerie A Schneider
  • Tina A Graves-Lindsay
  • Robert S Fulton
  • Richa Agarwala
  • John Huddleston
  • Sergey A Shiryev
  • Aleksandr Morgulis
  • Urvashi Surti
  • Wesley C Warren
  • Deanna M Church
  • Evan E Eichler
  • Richard K Wilson
Abstract

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. To overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

Similar resources

Genome-wide definitive haplotypes determined using a collection of complete hydatidiform moles.

We present genome-wide definitive haplotypes, determined using a collection of 74 Japanese complete hydatidiform moles, each carrying a genome derived from a single sperm. The haplotypes incorporate 281,439 common SNPs, genotyped with a high throughput array-based oligonucleotide hybridization technique. Comparison of haplotypes inferred from pseudoindividuals (constructed from randomized mole ...

Which Is More Prominent in Recurrent Hydatidiform Mole, Ovum or Sperm?

Recurrent hydatidiform mole is defined as episodes of two molar pregnancies in a female. Often, complete moles only derive androgenic nuclear genome. We described two cases with repeated molar pregnancies who attempted to prevent future episodes by performing intracytoplasmic sperm injection (ICSI) and preimplantation genetic diagnosis (PGD) to assess genetic disorders. The first pat...

D-HaploDB: a database of definitive haplotypes determined by genotyping complete hydatidiform mole samples

The Definitive Haplotype Database (D-HaploDB) is a web-accessible resource of genome-wide definitive haplotypes determined from a collection of Japanese complete hydatidiform moles (CHMs), each of which carries a genome derived from a single sperm. Currently, the database contains genotypes for 281 439 common SNPs from 74 CHMs which were determined by a high-throughput array-based oligonucleoti...

Characterization of structural variants with single molecule and hybrid sequencing approaches

MOTIVATION Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer ...

Evaluation of Haplotype Inference Using Definitive Haplotype Data Obtained from Complete Hydatidiform Moles, and Its Significance for the Analyses of Positively Selected Regions

The haplotype map constructed by the HapMap Project is a valuable resource in the genetic studies of disease genes, population structure, and evolution. In the Project, Caucasian and African haplotypes are fairly accurately inferred, based mainly on the rules of Mendelian inheritance using the genotypes of trios. However, the Asian haplotypes are inferred from the genotypes of unrelated individ...

Journal title:
  • Genome research

Volume 24, Issue 12

Pages  -

Publication date 2014